1 Introduction

A cognitive approach to language asks both representational and computational
questions. Our aim in our recent work, summarized in The Grammatical Basis
of Linguistic Performance, is to discover both what our knowledge of language
is (a question about representation) and how that knowledge is put to use (a
question about computation). We argued, and we'll reinforce that argument
here, that we can gain a deeper understanding of why natural languages are
built the way they are by considering how the problems of efficient parsing
and learning connect to the representation of grammars. We showed that if
one is willing to make a few strong but natural assumptions about constraints
on human parsing abilities and how grammars are used as parsers, then one
can show, in part, why locality constraints like Subjacency must be a part of
grammatical descriptions. Our assumptions were these:

• Parsing is deterministic, in the sense that once information about the 
structure of a sentence is written down, it is never retracted. This means 
that the information about a sentence is monotonically preserved during 
analysis. 

• Grammatical representations are embedded directly into parsers, without
intervening derived predicates or multiplied-out rule systems. This is an
assumption of transparency (Berwick and Weinberg 1984).

• The human brain is finite. 
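The determinism assumption can be pictured as a write-once store. The sketch below is our own illustration under that assumption, not the parser described in the book; `MonotonicParseRecord` and its keys are hypothetical names. New facts may be added and re-asserting a fact is harmless, but any attempt to change a recorded fact raises an error, which is the sense in which information is "monotonically preserved."

```python
from __future__ import annotations


class MonotonicParseRecord:
    """Write-once record of structural facts about one sentence.

    Hypothetical illustration of the determinism assumption: new facts
    may be added at any time, but a fact, once written, is never changed
    or removed.
    """

    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        old = self._facts.get(key)
        if old is not None and old != value:
            # Overwriting would be a retraction, which determinism forbids.
            raise ValueError(f"cannot retract {key}={old!r} (attempted {value!r})")
        self._facts[key] = value

    def read(self, key: str) -> str | None:
        return self._facts.get(key)

    def snapshot(self) -> dict[str, str]:
        return dict(self._facts)


record = MonotonicParseRecord()
record.write("Bill", "NP")
record.write("Bill", "NP")             # re-asserting the same fact is harmless
record.write("Bill.role", "subject")   # adding new information is allowed
try:
    record.write("Bill", "VP")         # revising an old fact is not
except ValueError as err:
    print(err)                         # cannot retract Bill='NP' (attempted 'VP')
```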

The assumptions about determinism and transparency are strong, but, as
we'll see, natural. They are meant to be. Our explanatory punch works in
direct proportion to the strength of the constraints: if we adopt a system where
anything goes, then we cannot explain why languages are built one way rather 
than another. 

Naturally, and fortunately, this leaves the system of assumptions open to
refutation. In a recent article to appear in Language and Cognitive Processes
(1985), Janet Fodor takes issue both with the linguistic details behind the
theory of grammar we adopt and with the assumptions of monotonicity and
transparency. We believe that each of these criticisms falls short, and we'll
survey just what Fodor says as well as our own position. But before launching
into a bill of particulars, it's worthwhile to step back and survey the approach
Fodor implicitly endorses.

There's a style of theory construction in A.I. that might be dubbed "universal
simulation." The idea is to adopt the weakest possible set of assumptions
about a computational process, for fear of being wrong. A lampoon version
goes something like this: (i) every cognitive process is a computational
process; (ii) Turing machines can simulate any computational process; so (iii) I'd
better adopt a Turing machine as a model of this cognitive process, because
otherwise I may miss something. That's sheer hyperbole, of course, but
something disturbingly close to this lies behind the embrace of nondeterminism as
a central feature of parsing models. The problem, as we specifically observe
in our book and as Fodor echoes, is that since nondeterministic computation
subsumes deterministic computation, one can always simulate the effect of the
deterministic assumption simply by making the cost of nondeterminism very
high. What Fodor fails to note is the flip side of this point: one can always
get the functional effect of recovery from failed determinism, such as garden
paths, by adding recovery procedures to deterministic parsers. So why all the
fuss? Don't these two apparently opposed camps just merge into a gray middle
ground?

The difference is one of point of view and methodological stance. Forcing
an essentially nondeterministic procedure to be deterministic by adding cost
to backup violates the spirit of nondeterministic computation in precisely the
same way that arbitrary backtracking would violate the spirit of determinism.
We prefer to make the stronger, and more refutable, hypotheses about
transparency and determinism. We'd argue that recovery from garden paths and
near garden paths need not cause a deterministic parser to throw up its hands,
but instead invokes quite particular, non-ad hoc reconstruction procedures that
use the information built up about the parse in a deterministic way. More about
that later. The important point here is that we adopt the determinism requirement
as a basic article, a "leading idea," to be weakened only under duress and in
quite limited, particular cases. In contrast, based on the same evidence, Fodor
adopts nondeterminism as a leading idea. These different positions lead to quite
different ways of thinking about parsing. For someone who endorses
nondeterminism, the hard part isn't figuring out how parsing gets done; that's
easier, because there is more machinery at one's disposal. The hard part is
figuring out what the constraints are and how to naturally enforce them. One
must now be able to say why parsing isn't done some other way that is just as
easy to encode using the extra machinery of nondeterminism. Plainly the burden
of proof here falls on Fodor's shoulders; her position is the weaker one. One
example of this point should suffice. Fodor argues that adding an extra memory
cell or its functional equivalent to a transition network parser (e.g., a hold
cell) makes parsing easy. Therefore, she concludes, it should be added. More
strikingly, she comments: "B[erwick] and W[einberg] simply have to stipulate
that their parser has no such facility." (page 50; our emphasis). But since when
does one have to stipulate the nonexistence of additional machinery? As Marcus
(1980:140) says on this point, "What demands explanation and motivation is
why a given facility is included in the model .... Thus, there is no reason to
explain why a mechanism of only limited power has been implemented if it can
be shown that it is enough to the job that is required." What is more, by
sticking to more restricted machinery, we can actually explain some of the
structural characteristics of natural languages.
Of course our leading idea may be incorrect. Then we will be led, regrettably,
to nondeterminism, to nontransparency, and perhaps beyond. We say
regrettably, because then we will be in a weaker position. Once the Pandora's
box of unlimited nondeterministic computation is opened, we can nail it shut
only by importing constraints from other domains. Again, this may be possible;
we cannot rule it out. Fodor hints at constraints on grammar size having to
do with parsing/learnability, but we'll see these arguments lack support. Simply
put, the search space of nondeterministically and nontransparently based
theories is much vaster. We prefer to start with the much smaller world of
determinism and work outwards.

We were well aware of this difficulty in our book. That's why we took great
pains to distinguish between two versions of nondeterminism: (1) "true"
nondeterminism in parsing, where all interpretations are carried along
simultaneously; and (2) "backtracking" nondeterminism, where all nondeterministic
alternatives are explored one at a time. We carefully observed that our functional
arguments bifurcating deterministic and nondeterministic parsing applied only to
true nondeterminism. By thinking about this contrast, we were led to quite
specific predictions about locality constraints in natural languages, predictions
that are, as we show in our book and as we'll underscore below, confirmed.
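The two versions of nondeterminism can be contrasted with a toy sketch. The lexicon, the one-main-verb check, and the function names are hypothetical illustrations of our own, not the grammar or parser of the book: `parallel_parse` carries every live analysis forward simultaneously ("true" nondeterminism), while `backtracking_parse` pursues one analysis at a time and retracts choices when it reaches a dead end.

```python
from __future__ import annotations

# Hypothetical toy lexicon: "raced" is ambiguous between a main verb (V)
# and a reduced-relative participle (PART); every other word has one label.
LEXICON: dict[str, list[str]] = {
    "the": ["DET"], "horse": ["N"], "raced": ["V", "PART"],
    "past": ["P"], "barn": ["N"], "fell": ["V"],
}


def consistent(labels: list[str]) -> bool:
    """Toy constraint on a prefix: at most one main verb so far."""
    return labels.count("V") <= 1


def complete(labels: list[str]) -> bool:
    """Toy constraint on a full analysis: exactly one main verb."""
    return labels.count("V") == 1


def parallel_parse(words: list[str]) -> tuple[list[list[str]], int]:
    """'True' nondeterminism: carry every live analysis forward at once.

    Returns the complete analyses and the largest number of analyses
    that were alive at the same time.
    """
    live: list[list[str]] = [[]]
    max_live = 1
    for word in words:
        live = [a + [tag] for a in live for tag in LEXICON[word]
                if consistent(a + [tag])]
        max_live = max(max_live, len(live))
    return [a for a in live if complete(a)], max_live


def backtracking_parse(words: list[str]) -> tuple[list[str] | None, int]:
    """Backtracking nondeterminism: explore one analysis at a time.

    Returns the first complete analysis found and the number of labels
    that had to be retracted along the way.
    """
    retracted = 0

    def extend(i: int, labels: list[str]) -> list[str] | None:
        nonlocal retracted
        if i == len(words):
            return labels if complete(labels) else None
        for tag in LEXICON[words[i]]:
            if not consistent(labels + [tag]):
                continue
            result = extend(i + 1, labels + [tag])
            if result is not None:
                return result
            retracted += 1  # this choice led to a dead end; undo it
        return None

    return extend(0, []), retracted


sentence = "the horse raced past the barn fell".split()
print(parallel_parse(sentence))      # ([['DET', 'N', 'PART', 'P', 'DET', 'N', 'V']], 2)
print(backtracking_parse(sentence))  # (['DET', 'N', 'PART', 'P', 'DET', 'N', 'V'], 4)
```

Both procedures find the same analysis. They differ in where the cost shows up: the parallel version holds at most two analyses at once here, while the backtracking version retracts four labels (barn, the, past, and the main-verb reading of raced) before succeeding.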

This much said, we can turn to Fodor's particular objections. As we noted
earlier, they fall into two parts: objections to our predictions about which
constructions will obey Subjacency and which will not, and objections to our three
key assumptions. As to the first set of objections, we'll see that while Fodor's
more refined observations about what constructions obey Subjacency and what
ones do not are correct, they in fact support our "leading idea" of determinism.
The second set of objections centers on the assumption of determinism and its
relationship to efficient parsability, our "modular" parser design and the
direct embedding of grammatical representations in the parser, and the restricted
space for writing down grammatical operations.

2 Determinism makes the right grammatical predictions

Turning first to the grammatical predictions of our model, Fodor's interesting
critique argues that our approach is both too strong and too weak. It is
too strong in that our approach predicts parasitic gaps to be subject to
Subjacency. This is because their deterministic detection requires scanning the
left context. 1 Nonetheless, we claimed that the distribution of these categories
was not governed by Subjacency.

Further, our approach is too weak because it cannot distinguish a subset
of gapping constructions that Fodor shows obey locality from a class that does
not. 2

First, we will show that Fodor's criticisms, while correct, deal with
noncrucial assumptions of our analysis. The assumptions that replace them are
fully compatible with our theory, and the data cited by Fodor actually support
our analysis in interesting ways. 3

2.1 Parasitic gaps

The most important thing to notice about our claim that parasitic gaps are not
subject to Subjacency is that it is false. Chomsky (class lectures, 1984) provides
the following examples showing that these constructions are in fact subject to
this constraint:

1. Who_i did you read a book about e_i to e_i?

2. Which man_i did you interview e_i without reading up on e_i?

*3. Which man_i did you interview e_i without reading [NP [the file]_j [S' you
made e_j on e_i]]?

In (1), both gaps are subjacent both to the complementizer and to each
other. This is shown by (4) and (5), where overt movement from both the
parasitic and the regular gap positions is acceptable.

4. Who_i did you read a book about e_i?

5. Who_i did you read the book (that Mary bought yesterday) to e_i?

1 To show this, Fodor cites examples where, in order to know whether an adjunct
clause with an ambiguous verb can take a parasitic gap object, we must see whether
the matrix clause contains a wh element in COMP. The relevant examples are
contrasted in (a) and (b):

(a) What did you cook without eating?

(b) Can you watch TV without eating?

In the second example, eating is unambiguously an intransitive verb, because there
is no wh movement in the matrix clause.

2 Before turning to these specific cases, let us dispense with one of Fodor's more
general criticisms: namely, that since the solution adopted does not solve all cases
of parsing ambiguity, it is dubious from the evolutionary perspective. In fact, this
kind of compromise is typical of what one finds in natural selection. The
evolutionary literature abounds with cases where selection has opted for solutions
that either solve only part of an evolutionary problem or create other problems.
(See footnote 10 of Berwick and Weinberg 1982.) Indeed Gould (1983) cautions us
against adaptationists who theorized "a world of perfect design, not much different
from that 'concocted' by 18th century natural theologians who 'proved' God's
existence by the perfect architecture of organisms ... we do not inhabit a perfected
world where natural selection ruthlessly scrutinizes all organic structures and then
molds them for optimal utility." (1983:155-156).

3 The following is a very condensed version of Weinberg (forthcoming).
Chomsky uses the contrast in (2) and (3) to argue that parasitic gaps are
bound to empty operators and are licit only if they are subjacent to these
operators. These empty operators are interpreted as marks of predication and
so must appear at the head of the adjunct clause. 4 Put in terms of our parsing
model, we can use the presence of the overt operator to signal the presence of the
"real" gap. The placement of the empty operator is governed by the independent
principles of A-bar binding. The presence of the empty operator, in turn, can be
used to signal the presence of the parasitic gap, if it is in a subjacent position. 5
In addition, Chomsky assumes that the theory of government interacts with
the theory of bounding in that only ungoverned nodes count for bounding. 6
Therefore, we will assume that the empty operator is subjacent to the real
operator. This analysis predicts that (3) is bad because, as a sign of predication
between the relative clause and the head of the complex NP, the empty operator
inside this relative must be bound to (coindexed with) the head. Coindexing
the parasitic gap to this operator as well would result in an ill-formed structure,
because quantifiers cannot be bound to two variables, as in (6). Neither the overt
operator at the head of the sentence nor the empty operator at the head of the
adjunct is subjacent to the gap, and so neither can license it. Therefore this
structure is ruled out. This contrasts with (2), where every trace is subjacent
to the operator that licenses it, as shown in (7).

*6. Which man_i [S did you [VP interview e_i] [PP without [S' OP_i [S PRO
reading [NP the file_j [S' OP_j [S that you made e_j on e_i]]]]]]]?

7. Who_i [S did you [VP interview e_i] [PP without [S' OP_i [PRO reading up
on e_i]]]]?

4 Alternatively, following Aoun and Clark (1985), we can claim that empty operators
count as A-bar anaphors and so obey the locality conditions that apply to this class.
See Weinberg (forthcoming) and Aoun, Hornstein, Lightfoot, and Weinberg
(forthcoming) for details.

5 This contrasts with Chomsky (1982), where parasitic gaps are considered underlying
PROs. Brody (1983) provides independent arguments showing that this account of the
distribution of parasitic gaps is inadequate because it relies on the so-called
functional definition of empty categories. In addition, the earlier analysis would
obviously not predict the observed distribution of the data, since PROs are
typically not bound by operators, empty or otherwise.

6 Chomsky must argue that all ungoverned nodes (not just NP or S) are bounding with
respect to Subjacency. This is because he wants to rule out direct movement from an
adjunct as in (a):

(a) *Which article did John read a book before filing?

In order to rule this out using Subjacency, he must claim that both PP and S count as
bounding nodes. Moreover, he must use Subjacency to rule these cases out, because this
is the only S-structure condition available to him, and the bounding constraint in
these constructions is an S-structure phenomenon, as shown by the grammaticality of (b):

(b) Who read a book before filing which article?

In Weinberg (forthcoming) and in Wahl (forthcoming) it is argued that the requirement
of lexical proper government in Chomsky's ECP actually applies at the level of
phonetic form (PF). This allows us to rule out a case like (a) by claiming that the
trace in the COMP of the adjunct is not properly governed, as shown in the
structure (c):

(c) *[S' Which article_i [did John read a book [before [S' e_i [PRO filing e_i]]]]]

Therefore, we can maintain the position that only S and NP count for the bounding
system. Thus the empty operator is subjacent to the real operator in parasitic gap
constructions.

Thus, in fact, Fodor is correct in claiming that our analysis should predict that
parasitic gaps are governed by Subjacency, and we were mistaken when we
claimed in our book that it did not. But we were also incorrect in believing
that the constraint did not hold. Assuming that we can show that the creation
of empty operators causes no problems for a deterministic system, we can use
their presence to license parasitic gaps in the appropriate structures. Thus we
can make the parsing model predict the properties of this construction in a
straightforward and independently motivated way. It is important to note at
this point that we are not changing assumptions in an ad hoc way simply to
model the facts. The problem with our first attempt was that we did not follow
the logic of our predictions clearly. The model actually predicts that parasitic
gaps should be governed by Subjacency, as Fodor notes in her article. In the
next section, we will show that the model is non-ad hoc in other ways, in that it,
or something like it, is needed to solve a general parsing problem that
is independent of the determinism issue.

In this section, we present an algorithm to create empty operators that is
also compatible with a deterministic approach. Note that the case of empty
operators in adjuncts is similar to the case of factive Noun Phrases cited by
Fodor in her criticism of Marcus. As in factives, the presence of the overt
operator makes parasitic gaps possible in adjunct positions, but it does not
make them obligatory in these structures. Consider (8)-(10).

8. Who did you meet without greeting?

9. Who did you meet without greeting him?

10. Who did you meet without clearing the rendezvous with security?

In a case like (8), the parser must place an empty operator in the
complementizer of the adjunct phrase in order to bind the empty parasitic object of the
verb greeting. In (9) and (10), by contrast, we do not want to place an empty
operator in this position, because there is no parasitic gap in the adjunct for
the operator to bind. 7 In (9) the potential parasitic gap position is filled by a
pronoun, and in (10) there is no corresponding gap position at all. Because of the
possibility of successive cyclic movement, however, the parasitic gap can be
indefinitely far away on the surface from the empty operator position. A deterministic
parser with limited lookahead will not be able to wait for the disambiguating right
context. 8 Therefore, there will be certain cases in which it will incorrectly place
an empty operator in the adjunct's COMP.

Fodor implies that these facts pose a problem solely for deterministic parsers,
suggesting that a nondeterministic solution is called for. In fact, the
deterministic/nondeterministic issue is beside the point. If the distinction is between
a deterministic parser and a nondeterministic parser that backtracks (Fodor's
choice), then both will have problems, because they both, at least superficially,
predict that such cases cause people to have noticeable difficulties in
comprehending these sorts of sentences. But none of (8)-(10) is difficult to
understand.

The nondeterministic parsers with backtracking that Fodor cites divide cases
of possible parser error into three types:

(a) Cases that are locally ambiguous but cause the parser no difficulty. Here
it is claimed either that the backtracking needed to transform an incorrect
false start into a correct analysis is so minor that it is not associated with a
computational cost, or that these parsers use an exact analog of a deterministic
parser's local buffer solution and thus always make the right choice. Some
examples of this kind of case are given in (11).

11a. John believes Bill.

11b. John believes Bill is a fool.

Even if the parser mistakenly hypothesized that the subject of the embedded
clause was the direct object of the verb believe, the backtracking needed to
insert the S marker between it and the verb is minor, and a nondeterministic
parser might be able to correct its mistake in a way that is relatively
cost-free. 9

In contrast, there are cases that require more extensive backtracking over
essentially unbounded distances. These cases can be divided into two types.

(b) Cases for which people register a strong preference for one of the possible
analyses (even when pragmatic biasing points to the other choice), but where
both readings are eventually available. An example of this case is shown in
(12), where, as Fodor mentions, there is an initial preference for the reading
where who is taken to be the subject of an embedded clause.

12. Who_i did the little girl beg to sing those stupid French songs (for) e_i?

(c) Cases of conscious garden paths where one reading is difficult. These are
cases where the alternative has to be pointed out, even if it is the only reading
resulting in a grammatical sentence. These include classic sentences such as
(13):

13. The horse raced past the barn fell.

The processing load here might be compatible with a backtracking approach
if it is assumed that backtracking over long distances is computationally costly.
(It can often be difficult to assess these effects in a backtracking model; see the
next section.) The extra burden imposed by true garden paths is a complex
effect that is partly lexical, partly structural, and exacerbated by distance (in
terms of the number of alternative, but unconsidered, pathways).

7 If these operators are available at all stages of comprehension, then the fact that
the empty operator has no variable to bind should make the sentence as bad as (a):

(a) *Who did John meet Mary?

8 The requirement that lookahead be limited is crucial because, as Marcus (1980) notes,
a deterministic parser with unlimited lookahead could well turn out to be able to
simulate a nondeterministic machine.

9 Note that this is true even for a deterministic parser, since we need only add a new
piece of information. See the next section for a related example.

Cases like (8)-(10) cause problems for the backtracking approach because
they break the association between the extent of backtracking necessary to
correct false starts and perceived sentence complexity. None of the examples in
(8)-(10) produces processing complexity. This shows that there is not even a
preference for adjuncts with or without parasitic gaps. Whatever the first
hypothesis of the (deterministic or backtracking) parser, whether it inserts an
empty operator in the adjunct's complementizer or not, one of the structures
is incorrectly predicted to be difficult to process because of the extensive
backtracking from the site of the disambiguating parasitic gap or end of the adjunct
needed to correct the mistake. (14a) and (14b) show that no extra processing
complexity is observed even in cases where the disambiguating right context is
very far away from the point where the decision about whether to insert an
empty operator must be made.

14a. Who did you search for without telling Sue to convince Bill to ask 
Harry to come with you? 

14b. Who did you search for without telling Bill to ask Sue to inform 
Harry that you would meet? 

It seems, then, that these kinds of sentences are problems for both deterministic
and nondeterministic (backtracking) parsers. We could solve them if we could
design an algorithm in which the semantic component simply didn't interpret
empty operators unless they were eventually bound to elements in argument
positions. Since these elements have no phonetic content, if they received no
semantic interpretation, it would be as if these elements never existed. 10 In
that case we could insert the empty operator in all such sentences and still be
sure to be right, because an unbound empty operator would simply be ignored:
it is invisible. In fact the two-stage parsing model discussed in our book
provides just such a mechanism.
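The argument can be made concrete with a toy cost model. We assume, purely for illustration, that repairing a wrong first guess costs as many words as lie between the adjunct's COMP and the point where the guess is disconfirmed. The word counts are our own tallies of (8), (9), (14a), and (14b), and `AdjunctCase`, `backtrack_cost`, and `insert_then_filter` are hypothetical names, not part of the model in the book.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AdjunctCase:
    label: str
    has_gap: bool   # does the adjunct turn out to contain a parasitic gap?
    distance: int   # words from the adjunct's COMP to the disambiguating point


# Our own tallies, counted from the word after "without" to the gap
# (or, if there is no gap, to the disambiguating word or end of adjunct).
CASES = [
    AdjunctCase("(8)", has_gap=True, distance=1),
    AdjunctCase("(9)", has_gap=False, distance=2),
    AdjunctCase("(14a)", has_gap=False, distance=12),
    AdjunctCase("(14b)", has_gap=True, distance=12),
]


def backtrack_cost(guess_op: bool, case: AdjunctCase) -> int:
    """Toy cost: words to re-scan if a fixed first guess proves wrong."""
    return 0 if guess_op == case.has_gap else case.distance


def insert_then_filter(case: AdjunctCase) -> tuple[int, bool]:
    """Always insert OP; return (revision cost, is OP interpreted?).

    Nothing is ever retracted, so the cost is zero; the operator is
    interpreted only if it ends up binding a gap.
    """
    return 0, case.has_gap


for guess in (True, False):
    worst = max(backtrack_cost(guess, c) for c in CASES)
    print(f"first guess insert_op={guess}: worst-case re-scan = {worst} words")
```

Under these assumptions both fixed guesses have a worst case of 12 words ((14a) if the parser always inserts the operator and must undo it, (14b) if it never does), although neither sentence is hard to understand; the insert-then-filter strategy never revises anything.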

We argued on conceptual and psycholinguistic grounds that the natural
language processor was a two-stage mechanism. 13 The first stage dealt with tree
expansion and the second dealt with indexation. In addition to having a
different function, the second stage worked on a different representation. During
the first stage, the completion of a category signaled the parser to shunt the
category's daughter into a separate stack, which we called the Prepositional
Node Stack (PNS). The intuition behind this shunting was that once a
category's thematic role was established from its position in the syntactic tree, the
parser wouldn't need to retain many of the details of syntactic structure. We
showed that elements in the same c-command domain are not put in the PNS
until all categories in the domain are complete. This algorithm allowed the
parser to correctly compute c-command relations between categories. This was
crucial, since these relations govern the application of the binding operations
on the previously expanded tree. Pursuing the intuition that the PNS was a
representation concerned with purely semantic aspects of the interpretation, we
placed a semantic visibility condition on the categories appearing in this
component. We claimed that to be interpreted by the semantic component (PNS),
a category had to have semantic features. These were the features that allowed
a Noun Phrase to denote either an individual or a set of individuals, or allowed
a quantifier to delimit a scope. 11 Assuming a category had such features, it
would be given a "referential index" and be visible in the PNS. If a category did
not intrinsically have such features, it could obtain a referential index by being
linked to an element that did. 12 Given the shunting procedure, an element
would have to be in the same c-command domain as its antecedent in order
to receive a referential index before being shunted into the PNS. If an element
did not receive an index before shunting, it would become invisible and receive
no interpretation. This allowed us to provide a principled explanation for the
fact that grammatical conditions specifying c-commanding antecedents seem to
apply only to categories with no independent referential status.

10 An alternative would obviously be to come up with an analysis that did not posit
empty operators in these and related cases. Such an account is difficult to conceive
of, because we would also have to account for the Subjacency effects that these
constructions exhibit. By this we do not mean coming up with an alternative
functional explanation for Subjacency in these cases. We mean allowing the parser
(or the grammar) to distinguish those cases that are grammatical from those that do
not obey the constraint.

11 Examples of categories with intrinsic semantic features are proper names like John,
pronouns like him, and wh phrases like what or which man.

12 Categories that have no intrinsic semantic features, and so can receive referential
indices only by linking, are bound anaphors like each other or herself, empty NP and
wh traces, and certain non-wh quantified expressions. See Weinberg (forthcoming) for
details.

Chomsky (1981 and 1984 class lectures) has suggested that association with a
thematic (theta) role is also a necessary condition on visibility for semantic
interpretation. We will adopt Chomsky's suggestion and state the combined condition
on visibility as follows.

15. (Visibility Condition) To be visible in the PNS, an element must 
be associated with a theta role (either by occupying a theta position or 
binding an element in a theta position) and must have referential features 
(features that either designate an individual or set of individuals or that 
delimit a range). 
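Condition (15) can be restated as a predicate. In this sketch the `Element` fields are hypothetical encodings of our own: "occupies a theta position," "binds an element in a theta position," "has intrinsic referential features" (footnote 11), and "is linked to an antecedent." An element without intrinsic features borrows them by following its antecedent links, as described for linked categories above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Element:
    name: str
    in_theta_position: bool = False      # occupies a theta position
    binds_theta_position: bool = False   # binds an element in a theta position
    intrinsic_referential: bool = False  # e.g. John, him, which man
    antecedent: Element | None = None    # element it is linked (coindexed) to


def has_theta_role(e: Element) -> bool:
    return e.in_theta_position or e.binds_theta_position


def referential_source(e: Element) -> Element | None:
    """Follow antecedent links until an intrinsically referential element is found."""
    seen: set[int] = set()
    current: Element | None = e
    while current is not None and id(current) not in seen:
        if current.intrinsic_referential:
            return current
        seen.add(id(current))
        current = current.antecedent
    return None  # no referential features anywhere along the chain


def visible_in_pns(e: Element) -> bool:
    """Visibility Condition (15): a theta role AND referential features."""
    return has_theta_role(e) and referential_source(e) is not None


who = Element("who", binds_theta_position=True, intrinsic_referential=True)
op_with_gap = Element("OP", binds_theta_position=True, antecedent=who)  # as in (8)
op_no_gap = Element("OP", antecedent=who)                               # as in (9), (10)
op_no_antecedent = Element("OP", binds_theta_position=True)             # no overt operator

print([visible_in_pns(x) for x in (who, op_with_gap, op_no_gap, op_no_antecedent)])
# [True, True, False, False]
```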

We will now show that the independently motivated shunting procedure and 
visibility conditions give an account of empty operators that explains why they 
cause no processing difficulties. 

Let us reconsider sentences (8)-(10). In (8), the parser recognizes that part
of the sentence is an adjunct phrase. This signals the possibility of a parasitic
gap in the subsequent structure. The parser therefore inserts an empty operator
in the COMP position, as shown in (16):

16. Who_i did you meet e_i without [S' OP_j . . .

If the parser subsequently finds a gap position in a subjacent domain, it can 
create a trace and bind the operator to it, thus associating the operator with a 
theta position, as in (17). 

17. Who_i did you meet e_i without [S' OP_j [S greeting e_j]]

Before shunting into the prepositional node stack, the operator must locate 
an antecedent in the c-command domain with a referential index. If it does not 
find one, then neither it nor its trace will be interpreted, because even though 
they are associated with a theta role, they are not associated with a category 
that delimits a range. In this case the overt operator who is present in the 
c-command domain, so both the empty operator and the trace can receive the 
category's referential index (i) and so be interpreted in the PNS. 

Compare this to (18). In (18) below, the parser will also detect an adjunct. 
It will not detect an overt operator, and so no empty operator will be cre- 
ated. Since there is no empty operator, no parasitic gap will be created in this 
structure. 

18. Did you watch the movie without [S' OP_j [S eating]]?



13 See Berwick and Weinberg (1984, pp. 173-182) for the conceptual arguments, and Weinberg
(forthcoming) and Weinberg and Garrett (forthcoming) for psycholinguistic results and
additional consequences of this approach.
In cases like (9) and (10) above, the adjunct and overt operator again trigger
the creation of an empty operator. Since there is no gap in the adjunct phrase,
the operator is not associated with a theta role. Therefore, even though there is
an overt operator to link with, the empty operator does not meet the criterion
for visibility in the PNS and so is not interpreted. 14 Since empty operators are not
interpreted unless both conditions on visibility are met, a deterministic parser
can always create these categories: they can never force it to simulate
nondeterminism, either by backtracking or by parallelism, in order to correct for
past mistakes. Note that this solution will only work for empty operators.
Lexically specified elements will receive a phonetic interpretation but no semantic
interpretation, a situation that will lead to unacceptability. An empty element
with no semantic features, however, is neither semantically nor phonetically
interpreted and so simply plays no role in the interpretation of the sentence. 15
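The procedure just described can be traced for (8), (9)/(10), and (18) with a small simulation. This is a sketch under simplifying assumptions of our own, not the book's implementation: only operator-related elements are modeled, the c-command check is reduced to "an overt operator exists," and `interpreted_in_pns` and the `Element` fields are hypothetical names. The point it shows is that the operator is created at once and never retracted; whether it is interpreted is decided later by the Visibility Condition.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Element:
    name: str
    theta: bool = False       # occupies, or binds, a theta position
    index: str | None = None  # referential index, once obtained


def interpreted_in_pns(overt_wh: str | None, gap_in_adjunct: bool) -> list[str]:
    """Toy walk-through of empty-operator creation and PNS visibility.

    `overt_wh` names the overt matrix operator (None if there is none);
    `gap_in_adjunct` says whether a gap later turns up in a subjacent
    position inside the adjunct. Returns the interpreted elements.
    """
    built: list[Element] = []

    wh = None
    if overt_wh is not None:
        # The overt operator has intrinsic referential features and binds
        # the real gap, so it meets both visibility requirements.
        wh = Element(overt_wh, theta=True, index="i")
        built.append(wh)

    # Step 1: adjunct + overt operator -> insert OP in the adjunct's COMP
    # immediately, without waiting for disambiguating right context.
    op = Element("OP") if wh is not None else None
    if op is not None:
        built.append(op)

    # Step 2: if a gap turns up, create a trace bound by OP; OP thereby
    # becomes associated with a theta role.
    trace = None
    if op is not None and gap_in_adjunct:
        trace = Element("e", theta=True)
        op.theta = True
        built.append(trace)

    # Step 3: before shunting, OP takes the referential index of the
    # c-commanding overt operator and passes it on to its trace.
    if op is not None and wh is not None:
        op.index = wh.index
        if trace is not None:
            trace.index = op.index

    # Step 4: shunt into the PNS. Nothing built earlier is retracted;
    # elements failing the Visibility Condition are simply not interpreted.
    return [e.name for e in built if e.theta and e.index is not None]


print(interpreted_in_pns("who", gap_in_adjunct=True))   # (8): ['who', 'OP', 'e']
print(interpreted_in_pns("who", gap_in_adjunct=False))  # (9), (10): ['who']
print(interpreted_in_pns(None, gap_in_adjunct=False))   # (18): []
```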
14 This approach will also handle empty operators in tough movement, topicalization, relative clauses, and the factive NPs that Fodor discusses in her criticism of Marcus. As should be obvious, since all these structures also involve predication between a phrase and a head, topic, or adjective phrase, exactly the same logic applies. See Weinberg (forthcoming) for details.

15 Throughout this account, we have assumed, contra Chomsky, that the empty operator is subjacent to the real operator. However, this assumption is not crucial, and remains to be verified (or falsified) by some fairly subtle empirical facts. To show this, let us assume (with Chomsky) that empty operators are not, in fact, subjacent to real operators. Then we must predict that the possible presence of an empty operator is cued solely by the presence of the adjunct structure. So in a case like (a),

(a) Did you catch a fish without eating?

the parser could mistakenly output a structure like (b):

(b) Did you catch a fish [PP without [OP_i [PRO eating e_i]]]

The empty operator and parasitic gap, having no referential indices, would disappear from the semantic component's representation. However, the case features on the parasitic gap would make it visible in PF. In fact, some speakers report an initial bias towards treating eat as a transitive verb in these structures, and thus say that the sentence sounds unacceptable. Interestingly, this bias does not carry over to structures where this verb is not in an adjunct:

(c) Did you think that Harry told Mary that he expected to eat?

If these sentences reflect true biases, then an algorithm based on Chomsky's definition of Subjacency would seem more appropriate. Such an account would be fully compatible with our approach at the conceptual level. We have noted cases in our book where, in order to be specifiable using terms licensed by the grammar, the Subjacency condition is in some sense "stricter" than the parser's needs. Here we have a case where a parser whose rules are written using the grammar's predicates will sometimes make mistakes. The prediction is that people will make the same mistakes. The facts here, however, are quite subtle, and since either alternative is compatible with our approach, we leave open the question of whether to place the Subjacency requirements on the empty operator.

The astute reader will have noted an apparent problem created by this solution. Why, one might ask, if empty categories can become invisible at later stages of interpretation, must we cue their creation to the presence of overt operators? The cases that motivated the account in the first place were those in
which the local subcategorization of a verb was indeterminate. Before positing an empty element after such a verb, we claimed that we had to make sure that an actual operator was present in the previously analyzed structure. However, given our present approach, one might be tempted to argue that if a verb that can be optionally transitive turns out to be used intransitively in a given structure, the gap will simply not be associated with an operator and so become invisible in the PNS. This seems to undercut the motivation for restrictions on left context, crucial for the functional motivation of Subjacency in the first place. But it is only elements with no phonetic features that can escape unacceptability if they are not semantically interpreted. Since wh elements have case features, 16 they will be visible in the phonological component. This makes certain predictions about the applicability of Subjacency to NP movement. As noted in Lasnik and Saito (1984), all the cases where we seem to need Subjacency to rule out unacceptable NP movements are also ruled out, redundantly, by the Empty Category Principle. Under our approach, we predict that NP movement should not be governed by Subjacency, thus eliminating this redundancy, always a welcome result. 18

Looking at the distribution of parasitic gaps from the parsing perspective allows us to supplement Chomsky's analysis in important ways. It allows us to derive the fact that parasitic gaps must be licensed at S-structure. That is, we derive as a theorem the fact that quantifiers and wh operators that move to COMP or some other pre-S position at LF do not create acceptable parasitic gap structures, as shown by examples (19a) and (19b).

*19a. [S You [VP [VP met who_i] [PP without greeting e_i]]]

*19b. [S Everyone [VP [VP met someone_i] [PP without greeting e_i]]]

16 See Chomsky (1981) for justification of this assumption.

17 See Aoun and Lightfoot (1984) for discussion.

18 See Weinberg (forthcoming) for details. Note that the non-government of NP movement by Subjacency reinforces the point made in Berwick and Weinberg (1984): namely, that Subjacency governs a natural class from the parsing perspective. The example just given shows that Subjacency governs only a subset of the movement constructions; the gapping examples discussed later in this section show that Subjacency governs a subset of the deletion constructions. From a grammatical viewpoint, this is an entirely unnatural result. This approach also makes sense of some preliminary results reported by Frazier (1984 NELS conference) and cited by Fodor in her article. Frazier claims that eye movement tasks suggest that subjects try to fill gaps using operators that are not subjacent to them, if the verbs governing the gap position are strongly subcategorized for direct objects. The cases are like those in (a):

a. *What_i did [NP the girl [S' who won e_i]] receive e_i

Given our approach, we might claim that the gap inside the island is created on the basis of the empty operator in the COMP of the relative clause. The fact that subjects seem to look back to the overt wh element is compatible with our approach if we claim that this is the result of the attempt to bind this operator (an operation not governed by Subjacency) to the overt operator.

We know independently that parasitic gap constructions are not licit if the real gap occurs in subject position. In addition, if our analysis is correct, the overt operator must occur in a c-commanding COMP. As mentioned, the c-command requirement is ensured by the shunting design of the parser. If an element does not c-command a category, it is not visible to it and so cannot be used to create that category as we expand the parse tree. Neither the wh element nor the quantifier in (19a) or (19b) c-commands the adjuncts containing the parasitic gaps. Given the above account, there will be no binder to give referential features to the empty operator in the COMPs of these adjuncts, and thus neither they nor their traces will be interpreted in the PNS. Given that the input for parsing decisions is the S-structure of the sentence, the subsequent movement of a category to a c-commanding position at a post-S-structure level cannot help the parser decide how to expand the parse tree. Our parsing theory can thus derive both the fact that Subjacency is an S-structure property and the facts that parasitic gaps are governed by Subjacency and licensed at S-structure, the central properties of the construction.
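The visibility condition above turns on c-command. The following minimal Python sketch, which is our illustration and not part of the parser described in the book, encodes trees as nested tuples and tests c-command with the usual first-branching-node definition. The LF tree, with a hypothetical trace t_i left by quantifier raising, is an illustrative assumption; the sketch shows only that someone_i fails to c-command the parasitic gap in (19b) at S-structure and does so only after LF movement, which a parser working from S-structure cannot exploit.

```python
# Trees are nested tuples (label, child, child, ...); leaves are plain strings.
# A node is identified by its path: the tuple indices walked from the root.

def node_at(tree, path):
    """Return the subtree reached by following `path` from the root."""
    for index in path:
        tree = tree[index]
    return tree

def find(tree, leaf, path=()):
    """Return the path of the first leaf equal to `leaf`, or None."""
    if tree == leaf:
        return path
    if isinstance(tree, tuple):
        for k, child in enumerate(tree[1:], start=1):  # skip the label
            found = find(child, leaf, path + (k,))
            if found is not None:
                return found
    return None

def dominates(upper, lower):
    """True if the node at path `upper` properly dominates the node at `lower`."""
    return len(upper) < len(lower) and lower[: len(upper)] == upper

def is_branching(node):
    return isinstance(node, tuple) and len(node) > 2  # label plus 2+ children

def c_commands(tree, a, b):
    """A c-commands B iff neither dominates the other and the first
    branching node above A dominates B."""
    if a == b or dominates(a, b) or dominates(b, a):
        return False
    up = a[:-1]
    while up and not is_branching(node_at(tree, up)):
        up = up[:-1]
    return dominates(up, b)

# (19b) at S-structure, bracketed as in the text.
S_STRUCTURE = ("S", "Everyone",
               ("VP", ("VP", "met", "someone_i"),
                      ("PP", "without", "greeting", "e_i")))

# Hypothetical LF: quantifier raising adjoins someone_i to S, leaving t_i.
LF = ("S", "someone_i",
      ("S", "Everyone",
       ("VP", ("VP", "met", "t_i"),
              ("PP", "without", "greeting", "e_i"))))

for name, tree in [("S-structure", S_STRUCTURE), ("LF", LF)]:
    binder, gap = find(tree, "someone_i"), find(tree, "e_i")
    print(name, c_commands(tree, binder, gap))
# S-structure False
# LF True
```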

2.2 Gapping constructions 

Fodor's next criticism deals with our analysis of gapping. She is correct in claim- 
ing that our treatment does not distinguish the subset of gapping constructions 
that obey bounding conditions from those that do not. As she points out, es- 
cape from bounding correlates with the appearance of an auxiliary marker in 
the pregap position. (20) and (21) illustrate. 



20a. Mary fishes in the ocean and Harry in the sea. 

*20b. Mary fishes in the ocean and I think Harry in the sea. 

21a. Mary has fished in the ocean and Harry has in the sea. 

21b. Mary has fished in the ocean and I think Harry has in the sea. 

In our previous analysis we claimed that bounding was expected in gapping 
constructions because the complements of the gapped verb had to be correctly 
attached in the VP internal or external position. Correct attachment depends 
on the properties of the verb. Since an overt verb is not available to direct 
the parser in a gapped constituent, we predicted that deterministic attachment 
of these complements required a look at left context (some previous conjunct 
containing an overt verb). Given the usual requirement of bounded access to this left context, the bounding constraint on these constructions followed. Since the parser faces the same problem in both types of gapping constructions, Fodor is right in claiming that we are incorrectly led to the conclusion that the presence or absence of an auxiliary marker in the gapped constituent should not influence the application of the constraint. Therefore, in countering this argument we must show that complement attachment of PPs does not require access to left context, but that there are other properties of gapping constructions that require this access only in cases where no overt auxiliary precedes the gapping site.
Let's start with the second point first. Consider the following examples. 

22. [S I consider [S Bill [VP to be a fool]]]

23. [S I consider [S Bill [NP a fool]]]

In (22) the embedded clause is an infinitival with a VP predicate and in (23) 
it is a small clause with an NP predicate. 20 The head of the VP predicate in 
(22) can be gapped, as shown in (24). 

24. [S John believes [S FRED is a FOOL] and [S HENRY [VP [V 0] AN IDIOT]]] 21

Fodor (1975) has shown that (24) actually involves two different deletion 
rules. Main Verb Deletion eliminates the verbal be form and Tense Deletion
removes the associated tense. Cast in parsing terms, the interpretation of the
second conjunct involves expanding the parse tree with both an empty tense
morpheme and an empty verb. Note however that the surface string in the sec- 
ond conjunct is locally ambiguous and could be expanded as a gapped structure 
or as a small clause. If we chose the small clause alternative, the sentence would 
be ruled out because believe does not take small clause complements, as shown 
by (25). 

*25. [S I believe [S John [NP a fool]]]

The only way that we can determine the proper expansion of the second 
conjunct in a case like (24) is by rescanning the left conjunct. Again we have 
a case where a deterministic tree expansion involves left context examination. 

20 The structure of small clauses is the subject of some controversy. Chomsky (1981), following Stowell (1981), argues that embedded categories like Bill a fool form sentential complements (in this case with the structure [NP [NP John] a fool]). Williams (1983) argues that these categories do not form a constituent and that they are properly analyzed as [... [NP John] [NP a fool] ...]. Hornstein and Lightfoot (forthcoming) argue against Williams's analysis and in favor of a modified version of the Chomsky-Stowell approach. The only point relevant to this argument, however, is that the predicates of small clauses are not VPs.

21 We follow Fodor's convention of indicating the placement of heavy stress on a word by capitalization.
Given our usual logic, we must ensure that we will never have to look at an unbounded stretch of left context. Therefore, we predict that cases involving tense deletion should obey bounding, exactly what Fodor demonstrates. As additional evidence, consider (26a). If the parsing version of tense deletion is governed by bounding, then we predict that the small clause analysis will be the only permissible expansion of the embedded clause in the second conjunct. Since believe doesn't take small clauses, we predict the unacceptability of the structure, in contrast with the acceptable (26b).

*26a. I think Fred is a fool and Sue believes John stupid.

26b. I think Fred is a fool and Sue believes John is stupid. 

In contrast, cases that involve only main verb deletion will never create the 
same kind of ambiguous situations. This is because the presence of an overt 
auxiliary unambiguously signals that a verb phrase must follow. One never 
finds overt auxiliaries in small clauses. Since the parser will always be right if 
it expands the phrase after an overt auxiliary as an empty headed VP, it will 
never have to scan the left conjunct. In a case like (27) it simply uses the locally 
available overt auxiliary to decide about subsequent expansion of the tree. 

27. John has fished in the ocean and Bill has in the sea.

Since we never need to examine left context when the auxiliary remains 
in the surface string, we do not expect Main Verb Deletion to obey bounding 
constraints. This is in fact what Fodor observes. 

This account has another virtue. The information provided by the left con- 
text to resolve the ambiguous cases will be available at the time the parser is 
confronted with the ambiguous material of the second conjunct. This contrasts 
with our previous analysis where, as Fodor correctly notes, proper identification 
of a verb's subcategorization and selectional properties demands access to the
actual verb of the previous conjunct. Unfortunately, our parser will have al- 
ready shunted this material into the PNS representation. Our parser shunts at 
the end of c-command domains leaving only immediate daughters of the com- 
pleted constituent available as information for future parsing decisions. This 
is no problem for our new analysis because we distinguish small clauses from 
gapped constituents merely by looking at previous conjuncts for the presence of 
a tensed auxiliary. If we treat sentences as maximal projections of INFL (Chomsky 1981) and if we assume that lexical information about the head of a category is projected from that head to its most maximal projection, then the relevant information will percolate up to the highest S node on the tree and thus be available to the parser for expansion decisions. 22

22 Projection to the most maximal projection is supported by movement of postverbal Subjects in Italian. Since these elements occur in structures like (a) we must ensure that the verb can transmit its features to the maximal VP in order for the trace of the postverbal Subject to satisfy the conditions on proper government imposed by the ECP.

Consider again a structure like (24), repeated as (28), with irrelevant details omitted.

By the time the parser reaches the locally ambiguous second conjunct, the first conjunct will have been shunted to the PNS. Thus information contained in this conjunct will not be available for decisions about tree expansion. This causes no trouble because the tensed character of the first conjunct can be read off the highest INFL projection that c-commands and is boundedly far from the INFL (INFL') of the next conjunct. If the first conjunct was a small clause, then the 0-inflection would also percolate up to the maximal S node. This is all the information the parser needs to correctly expand the tree of the second conjunct. If the previous conjunct contains a tensed or infinitival inflection, the parser expands the conjunct as a gapped structure. If the previous conjunct
contains a 0-inflection, then the parser expands the ambiguous structure as a small clause. This analysis makes the interesting prediction that if S' constituents instead of S constituents are conjoined, tense deletion should be unacceptable. Since S' is not a projection of INFL, conjunction of S' constituents would not allow percolation of information beyond the first conjunct in a structure like (28). 23 Since expansion as a tensed structure is conditioned by the presence of an overt auxiliary in the previous conjunct, the parser will not be able to apply the tense deletion rule. This is confirmed by comparing (29a) and (29b), where we have conjoined Ss, with (29c) and (29d), where we have conjoined S' constituents.

29a. That Frank would hit Sam and Bill would hit Harry surprised me.

29b. That [S Frank would hit Sam] and [S Bill [INFL 0] [VP [V 0] Harry]] surprised me.

29c. That Frank would hit Sam and that Bill would hit Harry surprised me.

*29d. [S' That [S Frank would hit Sam]] and [S' that [S Bill [INFL 0] [VP [V 0] Harry]]] surprised me.

As predicted, main verb deletion can apply in both conjoined Ss and conjoined S' constituents, as shown in (30).

30a. That Frank would hit Sam and Bill would Harry surprised me.

30b. That Frank would hit Sam and that Bill would Harry surprised me.

Thus this approach correctly distinguishes the two cases of gapping. 
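The expansion decision for a locally ambiguous second conjunct can be summarized as a small decision function. This is a hedged Python sketch of our reading of the procedure described above, not an implementation from the book; the names `Infl` and `expand_conjunct`, and the use of `None` to mean that no INFL feature percolated (the conjoined S' case), are illustrative assumptions.

```python
from enum import Enum
from typing import Optional

class Infl(Enum):
    """Inflection features that can percolate to a conjunct's top S node."""
    TENSED = "tensed"
    INFINITIVAL = "infinitival"
    ZERO = "0"  # the empty inflection of a small clause

def expand_conjunct(overt_aux: bool, prev_infl: Optional[Infl]) -> str:
    """Choose an expansion for a locally ambiguous second conjunct.

    overt_aux: an auxiliary survives before the gap (main verb deletion).
    prev_infl: the INFL feature readable on the previous conjunct's top node,
               or None when the conjuncts are S' nodes and nothing percolated.
    """
    if overt_aux:
        # An overt auxiliary always signals a following (empty-headed) VP,
        # so no left context is consulted and no bounding effect is expected.
        return "empty-headed VP"
    if prev_infl in (Infl.TENSED, Infl.INFINITIVAL):
        # Bounded left context shows a tensed or infinitival INFL.
        return "gapped structure"
    if prev_infl is Infl.ZERO:
        return "small clause"
    # Conjoined S' nodes: INFL did not percolate, so tense deletion is unavailable.
    return "no tense deletion"

print(expand_conjunct(True, None))          # auxiliary present, as in (30b)
print(expand_conjunct(False, Infl.TENSED))  # conjoined Ss, as in (29b)
print(expand_conjunct(False, None))         # conjoined S' nodes, as in (29d)
```

Each call mirrors one case above: an overt auxiliary never requires left context, a percolated tensed INFL licenses the gapped (tense deletion) expansion, and conjoined S' nodes leave nothing for the parser to read.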

23 See Zubizarreta (1982) and Stowell (1981).

Returning to our first problem, we must show why the problem of complement vs. adjunct attachment, which applies in both types of gapping, does not force the parser to look at left context, which would incorrectly predict that bounding constraints apply to both kinds of gapping. The treatment in our book assumed that the semantic interpretation of adjuncts and complements proceeded in essentially the same way, by reading off tree structure. If we assume this, then it follows that a deterministic parser must attach PPs and other adjunct phrases as they are attached by the grammar, in order to carry out semantic interpretation. However, this assumption is highly dubious. As Miller and Chomsky (1963), Marcus (1980), and many others note, in certain cases, strings of adjunct phrases can occur in potentially unlimited configurations. Thus a sequence like the man in the house by the river by the woods near the town can have any of the following interpretations:
[in the house] [by the river [by the woods]] [near the town]
[in the house [by the river] [by the woods [near the town]]]
[in the house [by the river [by the woods [near the town]]]]
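To see how quickly such bracketings multiply, the sketch below (our illustration, not from the text) counts the tree-compatible ways n PPs can attach to a head noun or to the object of an earlier PP. It enumerates attachments by brute force, discards those whose attachment arcs cross, and compares the result with the Catalan numbers, the standard closed form for this count, which grow exponentially in n.

```python
from itertools import product
from math import comb

def count_attachments(n: int) -> int:
    """Count tree-compatible attachments of n PPs that follow a head noun.

    Site 0 is the head noun; site j (1..n) is the object noun of PP_j.
    PP_k may attach to any earlier site 0..k-1.
    """
    total = 0
    # Each tuple gives the attachment site of PP_1 .. PP_n.
    for sites in product(*(range(k) for k in range(1, n + 1))):
        arcs = list(zip(sites, range(1, n + 1)))  # (site, pp_index)
        # Arcs (a_j, j) and (a_k, k) with j < k cross when a_j < a_k < j;
        # crossing arcs cannot both be represented in a single tree.
        crossing = any(aj < ak < j for aj, j in arcs for ak, k in arcs if j < k)
        if not crossing:
            total += 1
    return total

def catalan(n: int) -> int:
    return comb(2 * n, n) // (n + 1)

for n in range(1, 7):
    print(n, count_attachments(n), catalan(n))
```

For the four PPs in the example above, both columns give 14 candidate structures, and the ratio between successive counts approaches 4 as n grows.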

A parser that had to do semantic interpretation from tree structure would find itself in an exponential regress in such cases. In order to figure out which interpretation to give the sentence, it would have to compute the correct syntactic structure, but in order to do this it would have to compute all the possible patterns compatible with this string, and then see which one it "means to say." This will cause an exponential slowdown in the parsing algorithm if all trees must be explicitly reconstructed. One classic solution proposed by these authors is that adjunct phrases that can be ambiguous (either between adjunct and complement readings or between various adjunct readings) should be parsed essentially as flat structures. Semantic subroutines can then come in later and decide between the possible readings: a procedure that allows us to maintain efficient parsing. Put in the context of the gapping constructions, if a parser cannot figure out where an adjunct is attached from the local context, it can simply attach it as a flat structure to the lowest node in the parse tree. Then, independently needed semantic routines will give this phrase its appropriate semantic interpretation. Thus the attachment of adjunct PPs in neither type of gapping can force the parser to scan left context. Therefore, the attachment of adjunct phrases does not incorrectly predict bounding effects in Main Verb Deletion.
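The flat-attachment strategy can be sketched in two passes. In this hypothetical Python illustration, the parser's pass simply records ambiguous PPs as flat sisters under the head, and a later resolution pass picks one site per PP. The text does not specify the semantic routines; the greedy, score-driven `resolve` function and the `prefer` scoring callback are our own assumptions, chosen only to show that a reading can be selected without building and discarding the full set of alternative trees.

```python
from typing import Callable, Dict, List, Tuple

def parse_flat(head: str, pps: List[str]) -> Dict[str, object]:
    """Parser pass: attach every ambiguous PP as a flat sister under the head NP."""
    return {"head": head, "adjuncts": list(pps)}

def resolve(flat: Dict[str, object],
            prefer: Callable[[str, str], float]) -> List[Tuple[str, str]]:
    """Later 'semantic' pass: pick one attachment site per PP, left to right.

    `prefer(pp, noun)` is a hypothetical plausibility score. Only nouns on the
    current right edge of the structure are offered, so the choices always form
    a well-formed tree; no alternative tree is built and thrown away.
    """
    nouns = [flat["head"]] + [pp.split()[-1] for pp in flat["adjuncts"]]
    right_edge = [0]  # noun indices on the path from the head to the newest PP
    choices = []
    for k, pp in enumerate(flat["adjuncts"], start=1):
        # max() returns the first best site, so ties favor higher attachment.
        site = max(right_edge, key=lambda s: prefer(pp, nouns[s]))
        choices.append((pp, nouns[site]))
        right_edge = right_edge[: right_edge.index(site) + 1] + [k]
    return choices

# Hypothetical plausibility scores; unlisted pairs score 0.
SCORES = {("by the river", "house"): 2, ("by the woods", "river"): 2,
          ("near the town", "house"): 1}
flat = parse_flat("man", ["in the house", "by the river", "by the woods", "near the town"])
for pp, noun in resolve(flat, lambda pp, noun: SCORES.get((pp, noun), 0)):
    print(f"{pp!r} modifies {noun!r}")
```

With these made-up scores, the resolver attaches in the house to man, by the river and near the town to house, and by the woods to river; a different scoring function would select a different one of the readings listed earlier.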

3 Objections to basic assumptions: transparency and determinism

3.1 What is nondeterminism?

We'll first analyze the distinction between determinism and nondeterminism,
and how Fodor views that distinction. Fodor makes two points: 

1. A nondeterministic parser, just like a deterministic one, could benefit from
locality restrictions, if the cost of backup is high.

2. A deterministic parser cannot recover from error, and so cannot comport
with what is known about human processing of sentences.

Nondeterministic parsers do not reflect processing complexity

Let's take these points in turn. First, as we said earlier, one must distinguish between two versions of the nondeterminism hypothesis: true nondeterminism, where all possibilities are explored in parallel; and simulated nondeterminism, where one possible parse is explored at a time, and backup occurs if one line of attack fails. Only the first version makes the nondeterministic/deterministic parsing distinction clearcut, and this is the one we chose for comparison. The second version of nondeterminism is just like the Marcus model in that a single, particular sequence of parsing decisions is made as we move through the sentence, left-to-right. It is unlike a deterministic model in that revisions in that sequence of decisions are assumed to occur all the time.

Fodor does not make the clearcut choice. Instead, she opts for a deterministic,
one-path-at-a-time simulation of true nondeterminism. This position is
quite weak, because, as Fodor notes, one can turn this simulation into the func- 
tional equivalent of a deterministic parse simply by making the cost of revising 
decisions very high: 

Every point that M. makes could have been made just as well within 
the context of a nondeterministic parser which cared about efficiency. 
(Fodor, page 18) 

Imposing a cost metric on backup, then, gives us more flexibility. But is 
this too much flexibility? There are three basic options. If we say that backup 
costs are zero, then we have in effect the case of true nondeterminism; if we say 
that backup costs are infinite, we have a Marcus model. If we make the costs 
somewhere in between zero and infinite, we get a middle view. 

Fodor takes this as a virtue: all bases are covered. But is this so? Do we 
need at least this three-way split? If one is going to impose a constraint on a 
weaker system that has the functional effect of determinism, it would seem just 
as sensible to start with that constraint in the first place: assume the machine
is deterministic, and see if the required psycholinguistic complexity options can
be obtained this way. Cutting up the constraints this way makes a difference. A 
"cost" metric is the weaker position, because we must justify the metric we use 
somehow. That is, we must support both the assumption of nondeterminism 
and a particular cost metric. In contrast, a deterministic machine is directly 
built to act as if backtracking costs are very high. There is no separate cost 
metric device in the Marcus parser; therefore we need not justify one. All we 
need to justify is the assumption of determinism, which we must do in any case. 

There could be other grounds for the flexibility allowed by a cost-metric 
addition to the nondeterministic model. In a footnote to her paper, Fodor tries 
to turn the cost-metric model to her advantage, as a way to simulate observed 
human sentence processing. Fodor attempts to equate backtracking cost with 
processing difficulty: 

But it could very well be that the really severe garden path sen-
tences . . . are those for which all the wrong(=correct) initial choices
are reconsidered before the one that was truly at fault. This is
where the 2^n figure would approach a realistic estimate of parsing
time, and it would nicely account for the inordinate difficulty of these
sentences. Thus the striking differences that have been observed
in the processing difficulty of natural language sentences are per-
fectly consistent with the mathematical results for nondeterministic
parsing with online backup.

Fodor is claiming that a garden path sentence such as the horse raced past the 
barn fell demands exponential parsing time because of backup, while relatively 
easier "nongarden path" sentences (such as they told the students that John liked 
that Bill would leave) do not. But it is easy to see that both of these require 
the same amount of backtracking. The problem is that in a direct backtracking 
implementation, backup occurs all the time, even on simple sentences. For the 
first sentence, a backtracking parser must make a decision just before raced, 
between a relative clause and a VP. Assuming frequency preference, it takes 
the VP reading, which fails when fell is encountered. Now it must back up.
We'll assume the last previous choice point was just before raced. In fact,
this is not correct. In a pure backtracking parser, we would have to unwind 
to all intermediate choice points: there might be a relative clause after barn; 
there might be an NP object after raced; and so on. Finally, we arrive at the
choice at raced and can continue. If the machine can inspect the current word
it is scanning, two or three choice points are involved. 24 More backtracking
correlates with processing difficulty. Even so, such a sentence would not be
impossibly difficult for a backtracking parser. (And remember that it would
be perfectly easy for a true nondeterministic parser.) In fact, the backtracking
parser does not do exponential work on such an example. 
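The point about unwinding through intermediate choice points can be made concrete with a tiny depth-first backtracking recognizer, that is, simulated nondeterminism with alternatives tried in preference order. The grammar, lexicon, and the `expansions` counter (rule alternatives attempted, a rough proxy for backtracking work) are hypothetical and not taken from Fodor or from our book; the specific numbers are artifacts of this toy grammar. Like the machine assumed in the text, it inspects the current word before committing to a lexical category.

```python
# Toy grammar (hypothetical): alternatives are listed in preference order,
# so the simple NP is tried before the reduced-relative NP.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "RC"]],
    "RC": [["Vpart", "PP"]],  # reduced relative: "raced past the barn"
    "VP": [["V", "PP"], ["V"]],
    "PP": [["P", "NP"]],
}
LEXICON = {
    "the": {"Det"}, "horse": {"N"}, "barn": {"N"},
    "raced": {"V", "Vpart"}, "past": {"P"}, "fell": {"V"},
}

class BacktrackingRecognizer:
    """Depth-first, one-path-at-a-time recognizer (simulated nondeterminism)."""

    def __init__(self, words):
        self.words = words
        self.expansions = 0  # rule alternatives tried, including abandoned ones

    def parse(self, symbol, i):
        """Yield every position j such that `symbol` can span words[i:j]."""
        if symbol not in GRAMMAR:  # preterminal: inspect the current word
            if i < len(self.words) and symbol in LEXICON.get(self.words[i], set()):
                yield i + 1
            return
        for rhs in GRAMMAR[symbol]:
            self.expansions += 1
            yield from self.sequence(rhs, i)

    def sequence(self, rhs, i):
        if not rhs:
            yield i
            return
        for j in self.parse(rhs[0], i):
            yield from self.sequence(rhs[1:], j)

    def recognize(self):
        # Backing up is implicit: resuming a generator tries the next
        # alternative at the most recent choice point that still has one.
        return any(j == len(self.words) for j in self.parse("S", 0))

for sentence in ["the horse raced past the barn", "the horse raced past the barn fell"]:
    r = BacktrackingRecognizer(sentence.split())
    print(r.recognize(), r.expansions, sentence)
# True 5 the horse raced past the barn
# True 15 the horse raced past the barn fell
```

On the garden path, the extra work comes from the unwinding described above: before reaching the reduced-relative reading of raced, the recognizer retries the barn NP with a relative clause and the VP without its PP.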

What of the second sentence? Fodor must claim that such a case causes little or no backtracking, relative to garden path sentences. But here too, a backtracking parser must do a lot of work: before that John liked we call for an embedded Sentence instead of a relative; similarly before that Bill. When we get to would we must back up. First, we unwind to that Bill and try a relative clause reading for it. This fails. Then we back up to the next previous choice point, and try alternative categorizations for liked. Finally, we arrive at the choice between a relative and an embedded S just before that John liked. 25 Roughly the same backup takes place here as with the "real" garden path.

Of course, there might be some other parsing scheme to get us out of this particular dilemma. The problem is that any general scheme to make backtracking easy will almost necessarily make the garden path sentences easy as well. At heart, a backtracking parser backtracks, and it is quite difficult to use ad hoc cost measures to make it perform otherwise.

24 A "pure" ATN does not even look at the current word it is scanning in order to make a guess about what to do next. But this means that even very simple sentences such as Be careful involve extensive backtracking, because the machine guesses that it will see a declarative sentence, then a question, and so forth. This alternative would simply make our point even more strongly, so we won't adopt it.

25 Using standard ATN techniques, preference for one phrase type rather than another can be encoded by ordering the arcs that leave a network state. One can order the arc alternatives so as to take a relative clause push after that, but then this will be wrong and fail to account for the preferred embedded-S reading of they told the students that John liked the story.

Deterministic parsers can recover from garden paths 

Let's now turn to the second point, about deterministic parsing and error recov- 
ery. While Fodor wants the flexibility to simulate determinism when needed in 
her own model, she denies flexibility for a deterministic parser to recover from 
garden paths: 

The only difference between a deterministic parser and a nondeterministic
parser is that in the former a garden path analysis is
permanent and unrepairable, while in the latter garden paths can
occur and be recovered from during the parse. (Fodor, page 18)

But again, as Fodor acknowledges in her footnote 20, this is not to deny 
that there could be specialized deterministic recovery procedures for garden 
path sentences, as suggested by Marcus (1980). For these procedures to apply, 
we would of course toe the line of determinism: backup along the lines suggested
by Fodor (or in an ATN) would not be permitted. Ideally, following Marcus's
definition, the recovery procedure should only be allowed to add information
about the parse, not wipe out what has already been learned. Instead, when the
parser blocks (because no known rule applies), a recovery procedure could look 
globally at the state configuration of the parser. Then, by slightly rearranging 
existing subtrees of the parse, the recovery procedure should simply add new 
information about the sentence analysis and come up with the correct sentence 
structure. 

Interestingly enough, the Marcus design, slightly modified, provides the in- 
gredients of just such a theory of garden path sentence recovery. We can only 
sketch the basic idea here. 

Let us consider again the horse raced past the barn fell. When a Marcus-type 
parser fails on such a sentence, it is reading fell. But there is much information 
in its machine configuration (its pushdown stack and input buffer) of value
for error recovery. It is possible to design a natural recovery procedure that
uses this information deterministically to build the correct output, though at
some cost. For example, in the horse raced example, one need only insert
a new S boundary between horse and raced. There is also room within an
evaluation metric of recovery to differentiate between difficult garden paths and
easy-to-analyze sentences with interpretations. Barton and Berwick (1985) give
some of the details. Contrary to what Fodor asserts, recovery is possible in a 
deterministic machine. 

3.2 A two-stage design? 

Fodor also takes issue with our division of parsing labor into separate tree- 
building and indexing stages. Again, she makes two basic points: first, that this 
division is not motivated on grounds of computational efficiency; and second,
that this division is not motivated by the grammar (so that we are violating our
own assumption of transparency connecting grammar and parser). Again, we
disagree.

Consider computational efficiency. Fodor first claims that computational 
reasons alone can't motivate the bounded-context character of our parser: 

Given that the efficiency results for bounded context-parsing are no 
better than for LR(k) parsing in general, the crucial assumption that 
the first stage of B&W's parser is a bounded context device receives 
no support from these efficiency results. (Fodor, page 41). 

But as Fodor herself notes, computational complexity calculations are often 
relative to representational issues. If one picks some other representational
format, then certain computational issues can become irrelevant. For example,
if we adopt true nondeterminism, then it is not difficult to parse any sentence of 
a context-free grammar, no matter how ambiguous, in time proportional to the 
square of the grammar size and the cube of sentence length (where the grammar 
is measured in terms of the total number of grammatical symbols, like NP and
VP, not just rules; see Earley (1968)).
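For comparison, the chart-based technique Earley introduced can be sketched as a recognizer. This is a textbook-style Python version written for clarity, not an implementation from the sources cited; it assumes a grammar with no empty rules, and the toy grammar (left-recursive and ambiguous in PP attachment) is hypothetical. Each chart entry records every dotted rule compatible with the input so far, so all alternatives are carried forward together rather than tried one at a time; this short version is not tuned to meet the exact time bound.

```python
from collections import namedtuple

# A dotted rule lhs -> rhs with the dot before rhs[dot], begun at word `start`.
Item = namedtuple("Item", "lhs rhs dot start")

def earley_recognize(grammar, lexicon, words, start_symbol="S"):
    """Earley-style chart recognizer; assumes no empty (epsilon) rules."""
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start_symbol]:
        chart[0].add(Item(start_symbol, tuple(rhs), 0, 0))
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            item = agenda.pop()
            if item.dot < len(item.rhs):
                nxt = item.rhs[item.dot]
                if nxt in grammar:
                    # Predict: start every expansion of nxt at position i.
                    new_items = [Item(nxt, tuple(rhs), 0, i) for rhs in grammar[nxt]]
                else:
                    # Scan: advance the dot if the current word has category nxt.
                    if i < n and nxt in lexicon.get(words[i], ()):
                        chart[i + 1].add(item._replace(dot=item.dot + 1))
                    new_items = []
            else:
                # Complete: advance every item in chart[start] waiting for lhs.
                new_items = [p._replace(dot=p.dot + 1) for p in chart[item.start]
                             if p.dot < len(p.rhs) and p.rhs[p.dot] == item.lhs]
            for new in new_items:
                if new not in chart[i]:
                    chart[i].add(new)
                    agenda.append(new)
    return any(it.lhs == start_symbol and it.start == 0 and it.dot == len(it.rhs)
               for it in chart[n])

GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Name"], ["Det", "N"], ["NP", "PP"]],  # left-recursive
    "VP": [["V", "NP"], ["VP", "PP"]],
    "PP": [["P", "NP"]],
}
LEXICON = {
    "John": {"Name"}, "saw": {"V"}, "the": {"Det"}, "man": {"N"},
    "house": {"N"}, "river": {"N"}, "in": {"P"}, "by": {"P"},
}

print(earley_recognize(GRAMMAR, LEXICON, "John saw the man in the house by the river".split()))
print(earley_recognize(GRAMMAR, LEXICON, "John the man saw".split()))
```

The first call accepts a sentence with several possible PP attachments without choosing among them; the second rejects an ungrammatical string.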

This being so, one cannot divorce a discussion about computational effi- 
ciency from representational format. We have chosen to represent the parser's 
knowledge transparently, that is, to include only those categories sanctioned
by the grammar. The categories of our grammar include only the basic lexical
projections NP, VP, PP, and so on. 26 By saying that our parser works transparently,
we mean that the parser's rules can only make reference to these literal
symbols. To put the same point another way, transparency requires that the 
only states the parser has are the "states" — i.e., the nonterminal names — that 
the grammar has. The parser cannot use any derived facts about the grammar; 
nor can it appeal to nonterminal symbols that do not otherwise exist. For ex- 
ample, the parser cannot create a new state in order to "remember" that a wh 
phrase has been encountered earlier in the sentence. This would correspond to 
a complex nonterminal name such as WH/NP. 

In general, LR(k) parsers are allowed to create such states whenever they 
are needed. These states (in the form of a finite-state control table) encode the 
set of possible left-most derivation patterns for the given grammar. Since they 
represent derivation regularities, these states need not map in a 1-1 fashion to 
the nonterminal names of the grammar, and in fact the wh sentence example 
shows that in some grammars the nonterminals do not match the states of the 



26 Like most syntactic theories since Aspects of the Theory of Syntax, we also include traditional 
agreement, features like Person, Number, and Gender, as properties of lexical projections. 
We explicitly do not include the "slash" feature of Generalized Phrase Structure Grammar 
(resulting in complex categories like VP/NP), since this feature is not lexically projected 
(X° or lexical items arc specifically barred from having "slash* 1 features in GPSG). 
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parsing machine. 27 However, we have specifically barred the use of parsing 
states that do not correspond to lexically projected nonterminal names. There- 
fort", our approach does not admit the entire cliiss of LR(k) parsers. Instead, 
our parsing rules can make reference only to grammatical symbols. There is a 
class of deterministic parsers that defines such a class of machines, namely, the 
bounded-context parsers. 28 This is the parsing design we have adopted. 
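To make the transparency requirement concrete, here is a minimal Python sketch
of a bounded-context shift-reduce recognizer. It is our own illustration, not
the Marcus-style machine or the extension described in Berwick (1985): the toy
grammar, the rule table, and the name `recognize` are hypothetical, and the
only context consulted is one symbol of lookahead. What it shows is that every
rule is stated over grammar symbols alone (a handle plus a lookahead), so the
parser has no derived control states and nothing like WH/NP.

```python
from typing import FrozenSet, List, Optional, Tuple

END = "$"  # end-of-input marker (an assumption of this sketch)

# Each rule: (handle on top of the stack, allowed lookahead or None = any, result).
# Rules mention only grammar symbols; there are no derived parser states.
Rule = Tuple[Tuple[str, ...], Optional[FrozenSet[str]], str]

RULES: List[Rule] = [
    (("NP", "VP"), frozenset({END}), "S"),
    (("V", "NP"), None, "VP"),
    (("Det", "N"), None, "NP"),
    (("Name",), None, "NP"),
    (("V",), frozenset({END}), "VP"),  # intransitive only when nothing follows
]


def recognize(categories: List[str]) -> bool:
    """Deterministic shift-reduce recognition over lexical categories."""
    stack: List[str] = []
    buffer = list(categories) + [END]
    while True:
        lookahead = buffer[0]
        for handle, allowed, lhs in RULES:
            n = len(handle)
            if tuple(stack[-n:]) == handle and (allowed is None or lookahead in allowed):
                del stack[-n:]  # reduce: replace the handle by its category
                stack.append(lhs)
                break
        else:  # no rule applies
            if lookahead == END:
                return stack == ["S"]
            stack.append(buffer.pop(0))  # shift the next category


print(recognize(["Name", "V", "Det", "N"]))  # John saw the dog -> True
print(recognize(["Name", "V"]))              # John slept -> True
print(recognize(["Det", "Name"]))            # -> False
```

The choice between reducing a lone V and shifting an object is made by
inspecting a bounded window of grammar symbols; no step ever consults a state
that the grammar itself does not name.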

Fodor is correct that general computational grounds do not force the
bounded-context choice on us, but that is trivially so. For example, if we
adopted a more powerful device, such as a nondeterministic one, we would not
need this structure. But, all other things being equal, transparency is the
stronger assumption. It is stronger because we need not posit any entities
beyond those the grammar already gives us; and all other things are equal,
because in this case "all other things" are simply parsing efficiency and an
account of the psychological facts about parsing unbounded dependencies. 29 It
is of course true that a parser need not respect the representations provided
by the grammar. But it is simpler to assume that it does. A grammar that
contains just projections of lexical items is smaller, simpler, and hence easier
to learn than one that does not. There is a sense in which such a parser is
completely lexically based: there are just projections of lexical items, and
nothing more.

Fodor also argues that transparency itself does not motivate a literal
bounded-context parser, because the grammar contains rules that mention
variables: "as long as the transformational rules of the competence grammar can
contain variables (explicit or implicit) we would expect parsing rules employing
the same metalinguistic vocabulary to do the same." She concludes that we need
"an explicit prohibition against variables in the parsing rules" (Fodor, page
47). But again, there are two parts to any computational operation: the
procedure itself, and the data structure or representation it works on. In this
case, there are no variables because there are no complex category symbols, and
because the rules of the machine are finite. As Fodor notes, these are indeed
"stipulations" (page 48); one must always assume something in arguments about
computational matters, since we don't have the luxury of neurophysiological
findings.
27 This transparency distinction also shows up in the way that LR(k) parsers
are built. The usual approach is to process an LR(k) grammar to derive a
finite-state control table that is actually used for parsing. The states of this
table need not, and usually do not, correspond in any transparent way to
individual nonterminal names. Instead, in effect they stand for theorems about
derivations in a particular grammar. By banning such nontransparency, we are
banning such preprocessing.

28 See Floyd (1964). Actually, we must define an extension of the
bounded-context parsers that uses nonterminal lookahead, as the Marcus machine
does. For details, see Berwick (1985). We could also vary other details of the
bounded-context design, as long as we retain the key feature: parsing rules
must refer only to grammatical symbols, not to parsing states.

29 To make the same point in reverse, the only evidence for the more powerful
machinery of a hold cell or "slashed" categories seems to be the ability to
parse unbounded dependencies. But if this can be explained without resort to
such machinery, then its justification remains unestablished.
Similarly, Fodor "stipulates" that a grammar allows machinery beyond basic
X-bar categories, and that the parser includes backtracking as a standard
feature. The question is how natural these stipulations are. In fact, in
Government-Binding theory, the rule Move α does not have variables (Chomsky
1977, 1981 is quite explicit on this point). Deletions, on the other hand, can
have variables, but this is not relevant for parsing because deletions are
locally unambiguous (see the previous section on Gapping and Berwick and
Weinberg (1984)).

Beyond this question of bounded-context parsing, Fodor then goes on to 
question our division of parsing into two stages at all. She again claims that we 
violate our own criterion of transparency and that such a division is not needed 
on grounds of efficiency. 

The efficiency counterargument, at least in one form that Fodor gives, goes
something like this. Our second-stage procedure computes referential
dependencies, for instance that John and he may denote the same person in
sentences like this:

Johnᵢ believes that Fred thinks that Sue said that heᵢ is smart.

Since this procedure, whatever it is, must be able to search unbounded domains,
why not just let it do the job of searching for the antecedent of a wh phrase?
Alternatively, why not just fold the two stages together, combining both jobs
into one? In effect, Fodor wants to "multiply out" the two representational
levels we have distinguished into a single one because this is more
efficient. 30

30 At times, Fodor suggests just the opposite, as when she proposes that the
first and second stages might divide computational labor between them: "the
first stage device might call on the second-stage device to do the antecedent
check prior to trace population. This might call for a slightly more
complicated routine to pass control back and forth between the two, but the
labor saved could very well compensate." (Fodor, page 43)

Since Fodor elsewhere (Crain and Fodor 1984) has herself argued for the
computational benefits of nonmodular representations, it is worthwhile to see
just what is at stake here. Fodor's support for nonmodularity is surprising.
First of all, from the standpoint of computer science generally, it cuts against
the grain of all that is known about the efficient solution of complex problems.
(See, e.g., standard works on algorithms, such as Knuth, 1973; Aho, Hopcroft
and Ullman, 1974.) Second, and this is the key point, for modularity to work the
distinct levels should have different representational properties, because each
is designed to highlight different aspects of the same problem. This is the
source of the power behind the idea of two levels of representation, words and
phrases. It is easier to state the facts about agreement if we use Noun Phrases
and Verb Phrases rather than simple words, because then we have just two
simple representational units adjacent to one another (NP next to VP). In fact,
a simple finite-state automaton suffices, given that the phrases are constructed
first. Similarly, there are facts about language that are more easily stated in
terms of a linear arrangement of words, e.g., that a Determiner precedes a Head
Noun, and may agree with it. This (oversimplified) factored representation
can be modeled as a cascade of finite-state transducers, where the first-level
system, that of words, builds a phrasal representation and feeds the second
level. Is it possible to collapse these two levels into one? Yes: one can
"multiply out" all combinations of words and eliminate the phrasal level, by
forming the product of the two finite-state machines representing each level
(see Berwick 1982). However, it does not make sense to collapse these two
levels into one. The collapsed representation is much larger, because all
possible combinations of constraints, previously independently expressed at
each level, are now written out explicitly. In general, if the constraints on
one level can be expressed by a machine of size n, and the constraints on a
second level by a machine of size m, then the collapsed machine could be of
size nm. 31 In fact, this is one traditional argument for a multiple-levels view
of language, as initially expressed in Chomsky's Logical Structure of Linguistic
Theory. There are two computational advantages to the modular view: one, just
mentioned, is that the resulting system is easier to learn, if we equate smaller
size with easier learning; the second is that we can design computational
procedures tailored to work with the specific formats of each level.
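The n × m growth can be checked with the standard product construction for
finite-state automata. The Python sketch below is our own illustration, not the
construction in Berwick (1982): each "level" is stood in for by a toy
deterministic acceptor that counts one symbol modulo some number, and the
helper names `counter_dfa`, `product_dfa`, and `accepts` are hypothetical. The
product machine enforces both constraints at once.

```python
def counter_dfa(symbol, modulus, alphabet):
    """Toy constraint: the count of `symbol` must be divisible by `modulus`."""
    states = set(range(modulus))
    delta = {(q, c): ((q + 1) % modulus if c == symbol else q)
             for q in states for c in alphabet}
    return states, 0, delta, {0}


def product_dfa(a, b, alphabet):
    """Collapse two machines into one whose states are pairs of states."""
    _, a_start, a_delta, a_final = a
    _, b_start, b_delta, b_final = b
    start = (a_start, b_start)
    seen, frontier, delta = {start}, [start], {}
    while frontier:  # explore only the reachable pairs
        p, q = frontier.pop()
        for c in alphabet:
            nxt = (a_delta[(p, c)], b_delta[(q, c)])
            delta[((p, q), c)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    final = {(p, q) for p, q in seen if p in a_final and q in b_final}
    return seen, start, delta, final


def accepts(dfa, word):
    _, state, delta, final = dfa
    for c in word:
        state = delta[(state, c)]
    return state in final


ALPHABET = "ab"
level1 = counter_dfa("a", 3, ALPHABET)  # n = 3 states
level2 = counter_dfa("b", 4, ALPHABET)  # m = 4 states
collapsed = product_dfa(level1, level2, ALPHABET)
print(len(level1[0]) + len(level2[0]), len(collapsed[0]))  # 7 12
```

Kept separate, the two toy constraints need 3 + 4 = 7 states; multiplied out,
every pair is reachable and the single machine needs 12. As the machines grow,
the gap between n + m and nm widens quickly.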

This is exactly what we aimed for in our two-stage model. Each level has a
different representation that highlights different aspects of the computation of
linguistic structure, and each is designed to ease the computation of properties
relevant to that level. The first level deals with questions of how to build a
tree, and uses notions like dominance and precedence. For example, in the
sentence we gave just above, we expand the tree in exactly the same way no
matter whether he is bound to Fred or whether it is a free pronoun bound to a
discourse NP that occurred much earlier. This contrasts with cases governed by
Subjacency, where the presence or absence of an antecedent tells us how to
expand the tree we are building. If there is an antecedent in the structure and
a verb that selects or subcategorizes for an NP, we create a trace slot in the
phrase structure; otherwise, we do not. This is a decision about tree structure.

First, roughly speaking, referential dependencies can cut across sentences and
involve all the objects mentioned in a discourse, plainly outside the purview
of sentence tree predicates. Secondly, referential dependencies are calculated
on a different representational base from phrase structure, just as
Subject-Verb agreement is calculated at the level of phrases rather than words.

Collapsing the referential dependency calculation together with tree-building
would have exactly the same effect as computing Subject-Verb agreement at the
level of words. As we show in our book (Berwick and Weinberg 1984), our
first-stage procedure works in linear time, in time cn, where c is a constant
depending on the size of the output phrasal structure and the size of the
grammar, and n is the length of the input sentence.

31 For more realistic representational formats, e.g., context-free grammars,
the savings can be even larger. See Berwick 1982 for details. See the next
section for additional comments on this problem and grammar size.
The search for referential antecedents would now have to look at a
representation defined over complex tree shapes, including many irrelevant
structures. We note in our book that in the worst case this would increase
analysis time to kn², where n is the length of the input sentence, and k is some
constant that depends on the size of the phrase description. It is already
apparent that pronoun referential dependency can extend across sentences. It is
also apparent that this computation can be nonlinear: consider the laborious
calculation that seems to occur when one uses a pronoun whose antecedent lies
many sentences behind in a discourse. What Fodor wants to do by combining these
two steps is make the first-stage procedure nonlinear as well. But as she herself
notes (page 68: "in general, linear time parsing is surely just what a model of
the human sentence processing mechanism should aim for"), this would have the
unfortunate effect of making the construction of tree structure for single
sentences potentially nonlinear. We want to avoid this. We would like to recover
the right tree structure in linear time, even if the pronoun antecedents are not
in place. Note that there is much we can interpret about a sentence if we have
its correct phrase structure, even if we do not know that he is dependent on an
earlier NP. Fodor's collapsed scheme in effect forces the machine to stop and
wait for the right antecedent calculations to complete before plunging on. 32

By factoring apart the stages of tree construction and referential dependency
calculation, we gain at the second stage as well, because the size of the
structures the search procedure works over can be made smaller. That is, instead
of running our procedure in time cn², where c is large, we can run it in time
kn², where k is small because it depends only on a short list of NPs. As we
noted in our book, this is a difficult argument to make because in most cases
sentences are short. But let us see what it means in detail. The second-stage
representation includes shunted predicates and NPs. It is a simple matter to
take this propositional representation and build a finite-state transducer
(standing for a homomorphism) that projects just the NPs from this second list.
We may imagine this projected bag of NPs to be the discourse NPs for this
sentence; it could include, perhaps, the NPs for previous sentences, but just
NPs. It is because we have now isolated these units on a separate level that the
search for referential dependents is easier. No other units stand in the way of
a direct search through the NP list. In most cases, there will be only a few NPs
to look at. Note that this method only works because we have set up the first
stage to build just the right structured list so as to provide the right NPs to
look through. Further, in those cases where the list is large, we expect to find
nonlinear processing difficulty; informally at least, this is precisely what
seems to happen when there are many potential NP antecedents. 33
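A minimal Python sketch of the projection step follows, under assumptions of
our own: the second-stage output is modeled as a flat list of tagged units, the
tags "NP" and "PRED" and the agreement features are hypothetical, and the search
is a simple right-to-left scan that returns every feature-compatible NP. Only
the filtering step corresponds to the homomorphism described above; choosing
among the candidates is left open, as in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Unit:
    kind: str         # hypothetical tags: "NP" or "PRED"
    text: str
    number: str = ""  # toy agreement features for the compatibility check
    gender: str = ""


def project_nps(units: List[Unit]) -> List[Unit]:
    """Homomorphic projection: keep each NP, map every other unit to nothing."""
    return [u for u in units if u.kind == "NP"]


def candidate_antecedents(pronoun: Unit, nps: List[Unit]) -> List[Unit]:
    """Scan the NP list right to left; cost is linear in the list's length."""
    return [u for u in reversed(nps)
            if u.number == pronoun.number and u.gender == pronoun.gender]


# "John believes that Fred thinks that Sue said that he is smart."
second_stage = [
    Unit("NP", "John", "sg", "m"), Unit("PRED", "believe"),
    Unit("NP", "Fred", "sg", "m"), Unit("PRED", "think"),
    Unit("NP", "Sue", "sg", "f"), Unit("PRED", "say"),
    Unit("PRED", "smart"),
]
he = Unit("NP", "he", "sg", "m")
nps = project_nps(second_stage)
print([u.text for u in candidate_antecedents(he, nps)])  # ['Fred', 'John']
```

The scan never touches a predicate, which is the sense in which isolating the
NPs on their own level keeps the search short.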

32 One could design a "pipelined" scheme where a second-stage referential
dependency calculation works off the input from a first-stage device. But this
is just our two-stage model in another guise.

33 That is, a linear list of this kind, if long enough and if it included
discourse NPs, might take linear time to search for any single NP. Of course,
there are other possibilities, since not much is known about the representation
of semantic structures. For example, it could be that such NPs can be accessed
in constant time, up to a certain memory limit, as if one could instantly
remember the last 10 things mentioned. If so, then processing difficulties might
not show up on short sentences. Like so many other details about processing,
this one hinges on representational questions that we cannot answer in detail
as yet.

To summarize, we argue that isolating the referential dependency calculation
in this way pinpoints an important functional distinction between building tree
structure and referential dependency. Tree construction is fast (linear time,
and, in fact, real time if one examines our procedure in detail): each phrase is
built in a bounded amount of time. Coindexing (or referential dependency
calculation), which can be nonlinear, does not interfere with this. Fodor's
proposed one-stage model, because it interweaves these functionally distinct
processes, slows both down.

3.3 Another source for locality principles?

Finally, Fodor contends that locality principles could be motivated in a
GPSG-type theory, both on grounds of easy parsability and, a point that we
ourselves note, on grounds of learnability:

This negative result does not mean that subjacency could not be
functionally grounded in a GPSG. As chapter 3 observed, there are
many possible "functional" constraints that could have played a role
in the shaping of language. Foremost among these, at least
traditionally, is learnability. (Berwick and Weinberg 1984:166)

Fodor makes two specific proposals along these lines, one for parsability, and
one for parsability/learnability. Let's take each in turn.

Consider first her argument that a GPSG parser would benefit from locality
constraints resolved by context on the right, in sentences such as Who did you
help ..., where the parser must decide whether to insert a trace after help
or keep going so that the trace will appear in some lower complement. But
once again, this constraint just doesn't matter under the true nondeterministic
model. Advocates of GPSG often cite the parsing results for general context-free
grammars as evidence that such a system will work efficiently. But then Fodor's
demand for constraints on context becomes more mysterious. Suppose one uses
Earley's parser for context-free grammars, one standard algorithm on which the
efficiency results for generalized phrase structure grammar are often based.
Then all parses are kept in parallel, and there's no problem at all: both
alternatives are carried along, and when the problematic gap appears or fails
to appear, one of the possibilities falls by the wayside. There is no reason
that the locality constraint must exist. The point is not that the GPSG parser
cannot be made to benefit from a locality constraint but that it doesn't need to
benefit from a locality constraint in the right-context situation. 34
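To see what "kept in parallel" means in Earley's algorithm, here is a compact
Earley recognizer in Python. It is a textbook-style sketch under our own
assumptions (a hypothetical toy grammar with no empty productions and no slash
categories; recognition only, no trees), not any published GPSG parser. After
the prefix you think, the chart holds the items VP → V •, VP → V • NP, and
VP → V • S side by side; the remaining input decides which of them is ever
completed.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]
Item = Tuple[str, Tuple[str, ...], int, int]  # (lhs, rhs, dot position, origin)

# Toy grammar (hypothetical); any symbol that is not a key is a word.
GRAMMAR: Grammar = {
    "S": [("NP", "VP")],
    "VP": [("V",), ("V", "NP"), ("V", "S")],
    "NP": [("you",), ("Mary",), ("them",)],
    "V": [("help",), ("think",), ("left",)],
}


def earley_chart(words: List[str], grammar: Grammar, start: str = "S") -> List[Set[Item]]:
    chart: List[Set[Item]] = [set() for _ in range(len(words) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new: List[Item] = []
            if dot < len(rhs) and rhs[dot] in grammar:  # predict
                new = [(rhs[dot], alt, 0, i) for alt in grammar[rhs[dot]]]
            elif dot < len(rhs):  # scan a word
                if i < len(words) and words[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:  # complete: advance every item waiting for this category
                new = [(pl, pr, pd + 1, po) for pl, pr, pd, po in chart[origin]
                       if pd < len(pr) and pr[pd] == lhs]
            for item in new:
                if item not in chart[i]:
                    chart[i].add(item)
                    agenda.append(item)
    return chart


def earley_recognize(words: List[str], grammar: Grammar = GRAMMAR, start: str = "S") -> bool:
    final = earley_chart(words, grammar, start)[len(words)]
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in final)


print(earley_recognize(["you", "help"]))                   # True
print(earley_recognize(["you", "think", "Mary", "left"]))  # True
print(earley_recognize(["you", "Mary"]))                   # False
```

Nothing in the recognizer forces an early decision about the verb's
complement, which is why right context alone gives such a parser no reason to
need a locality constraint.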

34 Alternatively, one could dispense with the Earley algorithm and come up with
some other parsing algorithm for these systems. But then it remains to
establish that this alternative parsing method, whatever it is, is efficient.
Fodor does not offer a concrete alternative.

What about our trace-based parser, then? Why can't we add similar
parallelism and thus avoid the need for a locality constraint? Remember that our
parser design does not have complex categories such as S/NP, VP/NP, and so
on; it can use just the unalloyed categories provided by X-bar theory. It does
not use a hold cell, or any other special memory. Given these transparency
constraints, it is interesting that while true nondeterminism will make a
locality constraint for right-disambiguating contexts superfluous, it actually
leaves the demand for Subjacency unscathed. Consider what would happen if we had
a true nondeterministic, trace-based analysis of sentences such as What did Mary
say ... that John ate? Note that the analysis is completely determined up to the
point that the "gap" after ate is encountered. That is, the parser is not
carrying along two analyses at this point, as it is in the right-context case.
At ate, the parser takes the nondeterministic solution: it writes out one parse
with the trace inserted, and one with it not inserted. But now what? The
sentence ends. No additional information is forthcoming, and yet there are still
two viable analyses of the sentence. One of these is grammatical (where the
trace is inserted) and the other is not. But the sentence is not interpreted as
ambiguous, as having two analyses, one grammatical and one not. There is no
evident way to force the other reading out. Thus, the nondeterministic analysis
actually makes things worse here: it yields two candidate interpretations when
only one will suffice. To resolve these, we must now rescan the output analysis
tree to pick up whether a wh was present, adding to the computational cost.
Right context won't help us here, because there is no right context. But there's
no evidence that this reanalysis occurs, or that such a sentence is hard to
process. We conclude that nondeterminism does not help us if we have only the
categories S, NP, VP, etc. and no Subjacency; on the contrary, it hurts. Thus,
Subjacency is still predicted in our model, unlike Fodor's. Note that this is
quite unlike the right-disambiguating context case, where pursuing alternatives
in parallel allowed us to hold off making a decision until information became
available.

What about the second proposal, about learning? Just before her conclusion,
Fodor suggests that a GPSG system might need locality constraints to make its
rule system smaller, hence more easily parsable, and, as suggested in the other
papers where she has advanced this proposal (Fodor 1984), more learnable.

In the absence of any details about just how easy or hard it is to parse a
full-scale derived rule system, it is difficult to judge this proposal. We must
first emphasize that Fodor here is talking about a grammar that explicitly lists
possible phrase structure patterns rule by rule. This is rather different from
the current GPSG framework that represents a grammar via a set of dominance and
precedence statements (ID/LP format) for basic phrasal relationships,
implicational statements to encode feature redundancies, and metarules to
account for systematicities like active-passive sentences (Gazdar, Klein,
Pullum, and Sag, 1985). What one finds is that in any reasonably full-scale
grammar, for, say, English, the explicit rule system is so large that there's
only marginal gain in "reducing" the size of an explicit rule system in the
manner Fodor suggests. This is because the reduction is minuscule compared to
the total overall size of the rule systems themselves. Let's see why this is so.

To begin, we must be precise. Since Fodor wants to make an argument about
improving parsing efficiency by reducing grammar size, let us define grammar
size, |G|, as the total number of symbols in the grammar accessed for parsing.
This is the standard measure. (See Earley 1968 for discussion.) We do not
want to use the total number of individual rules of the grammar, because this
would weight against rule systems with "short" rules (e.g., A→BC; B→DEF as
opposed to A→DEFC).
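The measure is easy to compute. In this Python sketch (our own encoding, not
Earley's notation), a grammar is a list of (left-hand side, right-hand side)
pairs, |G| counts every symbol occurrence including each left-hand side, and
`grammar_size` is a hypothetical helper name. The example grammars are the two
in the parenthetical above; the last line checks the size-3-per-production
convention for Chomsky normal form used in footnote 35.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]


def grammar_size(rules: List[Rule]) -> int:
    """|G|: total symbol occurrences, one left-hand side plus the right-hand side per rule."""
    return sum(1 + len(rhs) for _, rhs in rules)


short_rules = [("A", ("B", "C")), ("B", ("D", "E", "F"))]
long_rule = [("A", ("D", "E", "F", "C"))]

print(len(short_rules), len(long_rule))                    # 2 1
print(grammar_size(short_rules), grammar_size(long_rule))  # 7 5

# A production in Chomsky normal form such as A -> B C has size 3,
# so 200 such productions give |G| = 600.
print(grammar_size([("A", ("B", "C"))] * 200))             # 600
```

Counting rules, the short-rule system looks twice as large (2 vs. 1); counting
symbols, the difference shrinks to 7 vs. 5.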

Let us now compare the grammar size of an explicit phrase structure rule
system that allows a one-S extraction constraint vs. one that allows extraction
across three S's. Elsewhere (Fodor 1984), Fodor has suggested this as an
example of the benefits of constraints: the tighter the constraints on
extraction, the fewer the rules. While this is literally true, the problem is
that such a grammar is already so large that any minor effect imposed by one
new constraint is swamped.

It is of course quite difficult to know what the "true" grammar size for
such a system is, because we do not know what the "true" grammar of any
natural language is, even of English. However, we can say this much: any such
explicit rule system must have a rule for every possible surface phrase
structure pattern. How many such patterns are there? Perhaps the most systematic
study of such patterns has been carried out in the context of Sager's work
(1981). For instance, Hobbs (1974) estimates that a subpart of the Sager
grammar, when expanded out into a context-free form, would be "about several
orders of magnitude larger" than the 200 productions and 300 context
restrictions it contains in context-sensitive form (1974:132). That is, the
expanded grammar
would have about 20,000 to 60,000 context-free rules. 35 We take this as a
fairly conservative estimate of the number of explicit, rule-by-rule
descriptions of phrase structure patterns in English. 36

35 The initial grammar's productions are in Chomsky normal form, and therefore
have a size of 3 per production. Thus the initial grammar size is about 600,
with 300 context restrictions.

36 Note that most grammatical descriptions that appear in the computational
literature in fact describe only small fragments of natural languages, quite
reasonably, since they are often designed to illustrate one or another
theoretical point, or work within a sublanguage that serves some functional end
(like database retrieval); they are not designed for broad coverage. For
instance, the example GPSG system described by Gawron, King, Lamping, Loebner,
Paulson, Pullum, Sag, and Wasow (1982) for database retrieval has an expanded
grammar size of about 1500-1800 (1982:77), but does not include many sentence
types and restrictions of the Sager grammar. For instance, appositives and
sentence adjuncts of many different types are not included (little did she know
that . . .; Whatever you say, the guy, the very same person you saw yesterday,
is . . .).

The Earley algorithm runs in time at most |G|²n³, where n is the sentence
length in tokens. That is, using the Earley algorithm with a fully expanded,
explicit rule system for English, the running time would be at worst
1.6 × 10⁹n³, or about a billion × n³. The result is that any change brought
about by introducing a constraint on extraction across one S rather than, say,
three, is irrelevant. The base grammar with three-S extraction will need two or
three extra nonterminal symbols, in order to "count" how many S's have been
crossed (S1, S2, S3). Suppose this adds 50 new rules. What happens to parsing
time? It is "exploded" from 1.5 billion n³ to 2.4 billion n³, an increase, to be
sure, but one that cannot possibly matter, because the constant factor is
already so large.

We do not mean to take this as a serious calculation; it is quite speculative.
However, the quantitative point still stands. This exercise is simply designed
to demonstrate that an explicit rule system doesn't exhibit the right kind of
demarcation between one and more than one that is so characteristic of natural
languages. Details about grammar size aside, if extraction across two domains
does not lead to a processing burden, then it is hard to say why three rather
than four or five domains does. No system grounded on explicit phrase structure
rules naturally distinguishes between a locality condition that acts over, say,
three domains and one that acts over a single domain. We just saw that there
could be no relevant difference for parsing, or for learning (if we equate size
of rule system with difficulty of learning). But we suspect that this simply
misses an important property of natural grammars: namely, that they do not have
"counting" predicates that distinguish between two, three, or 17 domains.
This is evidently a property of grammars generally, and has some power in
explaining the metrical structure of phonological rule systems (see Halle and
Vergnaud, forthcoming 1985). But why do grammars have this property? If we
assume that rule systems are written in a derived fashion, as Fodor insists,
then there is no reason for it. A grammar that counts to 16 is just as easily
parsed and just as easily learned as one that does not.

Suppose, in contrast, that there are no phrase structure rules, no explicit
derived rules at all. Instead, suppose that there are just individual lexical
items and their feature projections (as defined by X-bar theory), plus the
movement rules and constraints defined by GB theory. Now there cannot be any
rule of grammar that cuts across just three S domains. Individual lexical items
can subcategorize for single S's, and hence build phrases consisting of adjacent
S domains. Since movement can apply, we can move elements across these domains.
Cyclicity (iteration of this process) leads to superficially unbounded movement.
But no other constraints can even be stated. The vocabulary for writing down
grammars cannot refer to phrase structure rules, and so cannot write down a
chain of three S expansions to allow extraction across three S's but not four.
As we observed in our book, either free (unbounded) movement is possible, or
else movement across a single category is blocked; nothing in between is
allowed. This result, the noncounting property evidently true of natural
grammars, follows from the nonexistence of derived phrase structure rules. 37
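The contrast can be put in a few lines of Python. This is a deliberately
schematic sketch with assumptions of our own: a movement chain is represented
only as a list giving, for each link, how many S domains that link crosses, and
the function names are hypothetical. A condition stated link by link either
lets each step cross an S domain, in which case chains of any length pass, or
blocks the step; a limit such as "three S's but not four" needs a predicate over
the whole chain, which is exactly what a lexically based vocabulary lacks.

```python
from typing import List


def link_ok(s_domains_crossed: int) -> bool:
    """Locality stated per link: one movement step crosses at most one S domain."""
    return s_domains_crossed <= 1


def chain_ok(links: List[int]) -> bool:
    """A chain is licit iff each link is; the condition never counts links."""
    return all(link_ok(n) for n in links)


def counting_ok(links: List[int], limit: int = 3) -> bool:
    """Hypothetical 'counting' condition that sees the whole chain at once.

    A derived rule system could encode it with extra symbols such as S1, S2, S3;
    no check on individual links can express it, since chains of length 3 and 4
    made of one-S links pass exactly the same per-link tests."""
    return sum(links) <= limit


# Successive-cyclic movement, one S domain per step, for chains of any length.
print([chain_ok([1] * k) for k in (1, 3, 4, 17)])         # [True, True, True, True]
print(chain_ok([1, 2]))                                   # False: one step crosses two S's
print(counting_ok([1, 1, 1]), counting_ok([1, 1, 1, 1]))  # True False
```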

Of course, nondeterminism and the flexibility allowed in writing derived
grammars leave open many possibilities. As we have seen, this is exactly what
is wrong with a weak set of hypotheses: it leaves open too many avenues to
explore. As we said at the outset, we prefer to tackle the problem head on,
by adopting strong constraints that lead to interesting predictions and
explanations of why natural grammars are built the way they are, giving up those
constraints only when absolutely necessary. So far, we've been encouraged by
the results. Our predictions about locality principles, suitably revised, hold
up. Our modular design leads to testable hypotheses about the role of c-command
in language processing, now being probed (Weinberg and Garrett, forthcoming).
Our transparency assumption leads to noncounting grammars. We see no reason to
abandon the chase now, when we have come so far.



37 As far as we can tell, this property also holds in current GPSG frameworks
that avoid explicit phrase structure rules and use subcategorization and ID/LP
statements instead to define a set of admissible phrase structures. Thus this
version of GPSG also obeys noncounting.